This document describes the application of the PRS residualization approach to white subjects in the REGARDS data. Seven covariates are considered: alcohol use, gender, age, smoking, education (categorical), income, and weight. Age is modeled with a second-degree polynomial. Eight outcomes (and their polygenic risk scores) are considered: diastolic blood pressure (‘PGS000302’), glucose (‘PGS000684’), LDL (‘PGS000061’), systolic blood pressure (‘PGS000301’), total cholesterol (‘PGS000062’), triglycerides (‘PGS000066’), coronary artery disease (‘PGS000011’), and height (‘PGS000297’). All of these outcomes are continuous except coronary artery disease (CAD), which is modeled with logistic regression. For CAD, Pearson residuals will be used.
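As a rough illustration of the residualization step, the sketch below fits a linear model for a continuous outcome and a logistic model for a binary outcome, then extracts the residuals. The data are simulated, only a subset of the covariates is included, and the outcome and PGS column names (`glucose`, `cad`, `pgs_glucose`, `pgs_cad`) are hypothetical placeholders rather than the names used in the actual analysis.

```r
# Simulated stand-in data; this is NOT the REGARDS data.
set.seed(1)
n <- 200
dat <- data.frame(
  Age_x       = runif(n, 45, 85),
  Gender_x    = factor(sample(c("F", "M"), n, replace = TRUE)),
  Weight      = rnorm(n, 180, 30),
  pgs_glucose = rnorm(n),  # hypothetical PGS column name
  pgs_cad     = rnorm(n)   # hypothetical PGS column name
)
dat$glucose <- 90 + 5 * dat$pgs_glucose + 0.1 * dat$Age_x + rnorm(n, sd = 10)
dat$cad     <- rbinom(n, 1, plogis(-1 + 0.5 * dat$pgs_cad))

# Continuous outcome: linear regression on the PGS and covariates,
# with age entering as a second-degree polynomial.
fit_glu <- lm(glucose ~ pgs_glucose + Gender_x + poly(Age_x, degree = 2) + Weight,
              data = dat)
res_glu <- residuals(fit_glu)

# Binary outcome: logistic regression, then Pearson residuals,
# i.e. (y - p) / sqrt(p * (1 - p)) for fitted probability p.
fit_cad <- glm(cad ~ pgs_cad + Gender_x + poly(Age_x, degree = 2) + Weight,
               family = binomial, data = dat)
res_cad <- residuals(fit_cad, type = "pearson")
```
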
First, models will be fit with all seven covariates, and the residuals will be analyzed for structure. Then, models will be fit holding out each of the covariates in turn. All sets of residuals from leave-one-out models will be assessed via PCA and k-means clustering. For dichotomous or trichotomous covariates (smoking, alcohol use, gender), particular attention will be paid to k-means clustering.
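One possible way to generate the full and leave-one-out model formulas is sketched below. The covariate names are taken from the variance inflation output later in this document; the response `glucose` and the PGS column `pgs_glucose` are hypothetical names used only for illustration.

```r
# Covariate terms as they appear in the GVIF output.
covars <- c("Gender_x", "poly(Age_x, degree = 2)", "Alc_Use", "Income",
            "Smoke", "ED_Cat", "Weight")

# Full model: PGS plus all seven covariates (response and PGS names are hypothetical).
full_formula <- reformulate(c("pgs_glucose", covars), response = "glucose")

# Leave-one-out models: drop each covariate in turn, always keeping the PGS.
loo_formulas <- lapply(setNames(covars, covars), function(held_out) {
  reformulate(c("pgs_glucose", setdiff(covars, held_out)), response = "glucose")
})

print(loo_formulas[["Smoke"]])
```

Each formula in `loo_formulas` could then be passed to `lm()` (or `glm()` for CAD) with the analysis data set.
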
Subjects with missing data for any of the covariates, PGSs, or outcomes will be dropped from the analysis. This leaves us with a sample size of 1413 for a complete-case analysis.
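A minimal sketch of the complete-case filter, using a toy data frame with hypothetical column names:

```r
# Toy data with missing values; column names are placeholders.
dat <- data.frame(
  glucose     = c(95, NA, 110, 102),
  pgs_glucose = c(0.1, -0.3, NA, 1.2),
  Weight      = c(150, 180, 200, 175)
)

# Keep only rows with no missing value in any required column.
needed <- c("glucose", "pgs_glucose", "Weight")
complete_dat <- dat[complete.cases(dat[, needed]), ]
nrow(complete_dat)  # 2 rows remain in this toy example
```
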
As a case study, we first fit the multiple regression for glucose using the PGS and all seven covariates. Some diagnostic plots are below. The fit appears to deviate moderately from the modeling assumptions of linear regression. Because we do not intend to perform inference on the regression coefficients, this does not seem like a major issue. Indeed, since we hope that the residuals will show some type of ‘bimodal’ structure in some cases, it may be preferable that the residuals are not perfectly normal.
Investigate the variance inflation factor for the seven covariates.
##                             GVIF Df GVIF^(1/(2*Df))
## Gender_x                1.159287  1        1.076702
## poly(Age_x, degree = 2) 1.599706  2        1.124631
## Alc_Use                 1.331945  2        1.074290
## Income                  1.488465  1        1.220027
## Smoke                   1.287013  2        1.065113
## ED_Cat                  1.314812  3        1.046672
## Weight                  1.437353  1        1.198897
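The last column rescales the generalized variance inflation factor (GVIF) by the degrees of freedom of each term so that multi-column terms are comparable with single-column ones; for Gender_x, for example, 1.159287^(1/2) ≈ 1.076702. The column layout matches what `vif()` from the car package prints, so that is presumably what produced the table. The base-R sketch below computes the same quantities from the determinant definition, on simulated data with a reduced covariate set; the helper `gvif()` is a hypothetical name and assumes a model with an intercept.

```r
# Simulated data with mild collinearity between Weight and Income.
set.seed(2)
n <- 300
dat <- data.frame(
  Weight = rnorm(n, 180, 30),
  Smoke  = factor(sample(c("Current", "Never", "Past"), n, replace = TRUE))
)
dat$Income <- rnorm(n) + 0.02 * dat$Weight
dat$y <- rnorm(n)
fit <- lm(y ~ Weight + Income + Smoke, data = dat)

# GVIF per model term: det(R11) * det(R22) / det(R), where R is the
# correlation matrix of the non-intercept coefficient estimates,
# R11 the block for the term and R22 the block for all other terms.
gvif <- function(fit) {
  assign <- attr(model.matrix(fit), "assign")[-1]  # term index of each column
  R <- cov2cor(vcov(fit)[-1, -1])
  labs <- attr(terms(fit), "term.labels")
  out <- t(sapply(seq_along(labs), function(j) {
    idx <- which(assign == j)
    g <- det(R[idx, idx, drop = FALSE]) * det(R[-idx, -idx, drop = FALSE]) / det(R)
    c(g, length(idx), g^(1 / (2 * length(idx))))
  }))
  dimnames(out) <- list(labs, c("GVIF", "Df", "GVIF^(1/(2*Df))"))
  out
}

gvif(fit)
```

For a term with one degree of freedom, the GVIF reduces to the ordinary variance inflation factor 1 / (1 - R²), where R² comes from regressing that covariate on the others.
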
Calculate the coefficient of determination for each of the models to give a sense of how predictive each covariate is. For the logistic CAD models, use AUROC.
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
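The two summary measures can be sketched in base R as follows, on simulated data. The messages above are the kind `roc()` from the pROC package prints when it infers levels and direction, so the original analysis presumably used that package; the rank-based `auroc()` helper here is a hypothetical stand-in that computes the same area via the Mann-Whitney statistic.

```r
# Simulated predictor with one continuous and one binary outcome.
set.seed(3)
n <- 400
x <- rnorm(n)
y_cont <- 2 * x + rnorm(n)
y_bin  <- rbinom(n, 1, plogis(x))

# Coefficient of determination for the linear model.
r2 <- summary(lm(y_cont ~ x))$r.squared

# AUROC for the logistic model: the probability that a randomly chosen
# case has a higher fitted score than a randomly chosen control.
auroc <- function(score, label) {
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(rank(score)[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
fit <- glm(y_bin ~ x, family = binomial)
auc <- auroc(fitted(fit), y_bin)
```
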
Assess clustering of the residuals from the full model using the gap statistic to determine the preferred number of clusters. Plot the first two principal components and look for structure. Studentized residuals will be used; these residuals are also scaled to ensure unit variance.
## Warning: did not converge in 10 iterations
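A sketch of this step on simulated data is given below. It assumes that the residuals from each outcome model form one column of a matrix, that `rstudent()` is the studentization used, and that the gap statistic comes from `clusGap()` in the cluster package; none of these details are confirmed by the text. The convergence warnings above are the kind `kmeans()` emits when it reaches its default limit of 10 iterations, so the sketch also passes a larger `iter.max` and several random starts, which may or may not be appropriate for the real data.

```r
library(cluster)  # provides clusGap() and maxSE()

# Simulated predictor and three outcomes; stand-ins for the real models.
set.seed(4)
n <- 150
x <- rnorm(n)
outcomes <- data.frame(y1 = x + rnorm(n), y2 = -x + rnorm(n), y3 = 0.5 * x + rnorm(n))

# One column of studentized residuals per outcome model, scaled to unit variance.
res <- sapply(outcomes, function(y) rstudent(lm(y ~ x)))
res <- scale(res)

# Principal components of the residual matrix; the first two are the ones plotted.
pca <- prcomp(res)
# plot(pca$x[, 1:2])

# Gap statistic over k = 1..5; extra arguments are forwarded to kmeans().
gap <- clusGap(res, FUNcluster = kmeans, K.max = 5, B = 20,
               iter.max = 50, nstart = 10)
k_hat <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")
```
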
Assess clustering of the residuals from the model without gender using the gap statistic to determine the preferred number of clusters. Plot the first two principal components and look for structure.
## Warning: did not converge in 10 iterations
## [1] "Adjusted rand index, no gender: 0.015"
## [1] "No gender table of clustering results"
##
##       F   M
##   1 258 488
##   2 307 360
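The adjusted Rand index can be recomputed directly from the contingency table of cluster labels against gender. The helper below, `ari_from_table()`, is a hypothetical name for a base-R sketch of the standard formula; packages such as mclust offer an `adjustedRandIndex()` function that works from the two label vectors instead.

```r
# Adjusted Rand index from a contingency table of two partitions.
ari_from_table <- function(tab) {
  comb2 <- function(x) x * (x - 1) / 2          # number of pairs, choose(x, 2)
  index    <- sum(comb2(tab))                   # pairs grouped together in both partitions
  rows     <- sum(comb2(rowSums(tab)))
  cols     <- sum(comb2(colSums(tab)))
  expected <- rows * cols / comb2(sum(tab))     # expected index under random labeling
  (index - expected) / ((rows + cols) / 2 - expected)
}

# Cluster-by-gender counts from the table above.
gender_tab <- matrix(c(258, 307, 488, 360), nrow = 2,
                     dimnames = list(cluster = c("1", "2"), gender = c("F", "M")))
round(ari_from_table(gender_tab), 3)  # 0.015, matching the reported value
```

A value this close to zero indicates that the two k-means clusters agree with gender barely better than a random labeling would.
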
Assess residuals from the model without smoking.
## Warning: did not converge in 10 iterations
## [1] "Adjusted rand index, no smoking: 0.007"
## [1] "No smoking table of clustering results"
##
##     Current Never Past
##   1      80   231  239
##   2      75   289  227
##   3      41    92  139
Assess residuals from the model without alcohol.
## Warning: did not converge in 10 iterations
## [1] "Adjusted rand index, no alcohol: 0"
## [1] "No alcohol table of clustering results"
##
##     Current Never Past
##   1     234   117   69
##   2     252   147  102
##   3     284   122   86
Assess residuals from the model without weight. There appears to be an oddly large number of duplicate values for weight; this warrants further investigation.
## Warning: did not converge in 10 iterations
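One way to start investigating the duplicate weight values is to count distinct values and look at the most frequent ones. The sketch below uses simulated weights rounded to the nearest 5 as a purely hypothetical example of heaping; whether the real variable behaves this way is unknown. The same checks could be applied to any other covariate with few distinct values.

```r
# Simulated weights, rounded to the nearest 5 (an assumed form of heaping).
set.seed(5)
weight <- round(rnorm(500, 180, 30) / 5) * 5

n_unique <- length(unique(weight))                       # number of distinct values
top_values <- head(sort(table(weight), decreasing = TRUE), 5)  # most common values
share_multiple_of_5 <- mean(weight %% 5 == 0)            # fraction that are multiples of 5

n_unique
top_values
share_multiple_of_5
```
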
Assess residuals from the model without income. There appear to be very few unique values for income; this likely warrants further investigation.
## Warning: did not converge in 10 iterations
Assess residuals from the model without age.
## Warning: did not converge in 10 iterations
Assess residuals from the model without education. Compare 4-center k-means clustering to the four education categories with the adjusted Rand index.
## Warning: did not converge in 10 iterations
## [1] "Adjusted rand index, no education: 0.002"
## [1] "No education table of clustering results"
##
##     College graduate and above High school graduate Less than high school Some college
##   1                        128                   81                    29          104
##   2                        171                   95                    34           95
##   3                         99                   49                    29           76
##   4                        169                  106                    30          118